HDFS scalability: the limits to growth

نویسنده

  • Konstantin V. Shvachko
چکیده

Konstantin V. Shvachko is a principal software engineer at Yahoo!, where he develops HDFS. He specializes in efficient data structures and algorithms for large-scale distributed storage systems. He discovered a new type of balanced trees, S-trees, for optimal indexing of unstructured data, and he was a primary developer of an S-tree-based Linux file system, treeFS, a prototype of reiserFS. Konstantin holds a Ph.D. in computer science from Moscow State University, Russia. He is also a member of the Project Management Committee for Apache Hadoop.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Ceph as a scalable alternative to the Hadoop Distributed File System

[email protected] THE HADOOP D I S TR I BUTED F I L E System (HDFS) has a single metadata server that sets a hard limit on its maximum size. Ceph, a high-performance distributed file system under development since 2005 and now supported in Linux, bypasses the scaling limits of HDFS. We describe Ceph and its elements and provide instructions for installing a demonstration system that can be used...

متن کامل

High Scalability of HDFS using Distributed Namespace

In data intensive computing, Hadoop is widely used by organizations. The client applications of Hadoop require high availability and scalability of the system. Mostly, these applications are online and their data growth rate is unpredictable. The present Hadoop relies on secondary namenode for failover which slows down the performance of the system. Hadoop system’s scalability depends on the ve...

متن کامل

Using Hadoop as a Grid Storage Element

Hadoop is an open-source data processing framework that includes a scalable, faulttolerant distributed file system, HDFS. Although HDFS was designed to work in conjunction with Hadoop’s job scheduler, we have re-purposed it to serve as a grid storage element by adding GridFTP and SRM servers. We have tested the system thoroughly in order to understand its scalability and fault tolerance. The tu...

متن کامل

HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases

Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ sing...

متن کامل

A Novel Approach for Improving Security and Storage Efficiency on HDFS

Distributed file system for the storage of massive files have obvious advantages compared with the conventional file system. For instance, Hadoop Distributed File System (HDFS) implemented with commodity hardware has the advantages of low cost, high fault tolerance, scalability, etc. However, HDFS has the potential safety hazard due to the unencrypted data stored in Datanode, which may cause da...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010